

AD-A277  201 




VISUAL  EVALUATION  OF  COMPUTER-GENERATED  TEXTURES 


George A. Geri
Don  R.  Lyon 
Yehoshua  Y.  Zeevi 


University  of  Dayton  Research  Institute 
300  College  Park  Avenue 


Dayton,  OH  45469-0110 




HUMAN  RESOURCES  DIRECTORATE 
AIRCREW  TRAINING  RESEARCH  DIVISION 
6001  S.  Power  Road,  Bldg  558 
Mesa,  AZ  85206-0904 


January  1994 

Final Technical Report for Period October 1990 - March 1993


Approved  for  public  release;  distribution  is  unlimited. 


94-08954 




AIR  FORCE  MATERIEL  COMMAND 
BROOKS  AIR  FORCE  BASE,  TEXAS 




NOTICES 


This  technical  report  is  published  as  received  and  has  not  been  edited  by  the 
technical  editing  staff  of  the  Armstrong  Laboratory. 

When  Government  drawings,  specifications,  or  other  data  are  used  for  any 
purpose other than in connection with a definitely Government-related procurement,
the United States Government incurs no responsibility or any obligation
whatsoever.  The  fact  that  the  Government  may  have  formulated  or  in  any  way 
supplied  the  said  drawings,  specifications,  or  other  data,  is  not  to  be  regarded  by 
implication,  or  otherwise  in  any  manner  construed,  as  licensing  the  holder,  or  any 
other  person  or  corporation;  or  as  conveying  any  rights  or  permission  to 
manufacture, use, or sell any patented invention that may in any way be related
thereto. 

The Office of Public Affairs has reviewed this report, and it is releasable to the
National Technical Information Service, where it will be available to the general
public,  including  foreign  nationals. 

This  report  has  been  reviewed  and  is  approved  for  publication. 

ELIZABETH L. MARTIN
Project Scientist

DEE H. ANDREWS, Technical Director
Aircrew Training Research Division

RROLL, Colonel, USAF
Chief, Aircrew Training Research Division


REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.


1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE: January 1994

3. REPORT TYPE AND DATES COVERED: Final, October 1990 - March 1993

4. TITLE AND SUBTITLE: Visual Evaluation of Computer-Generated Textures

5. FUNDING NUMBERS:
C - F33615-90-C-0005
PE - 62205F
PR - 1123
TA - 03
WU - 85

6. AUTHOR(S):
George A. Geri
Don R. Lyon
Yehoshua Y. Zeevi

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES):
University of Dayton Research Institute
300 College Park Avenue
Dayton, OH 45469-0110

8. PERFORMING ORGANIZATION REPORT NUMBER

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES):
Armstrong Laboratory (AFMC)
Human Resources Directorate
Aircrew Training Research Division
6001 S. Power Road, Bldg 558
Mesa, AZ 85206-0904

10. SPONSORING/MONITORING AGENCY REPORT NUMBER: AL/HR-TR-1993-0189

11. SUPPLEMENTARY NOTES:
Armstrong Laboratory Technical Monitor: Dr. Elizabeth L. Martin, (602) 988-6561.

12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.

12b. DISTRIBUTION CODE


13.  ABSTRACT  (Maximum  200  words) 

Textures  generated  by  superimposing  sinusoidal  luminance  distributions  can  be  used  to  simulate  the  natural  terrain 
textures  often  used  in  flight  simulator  imagery.  Since  the  visual  system  is  spatially  inhomogeneous  with  the  periphery 
being  generally  less  sensitive  than  the  center  of  the  visual  field,  simpler,  more  easily  generated  textures  can  potentially  be 
used  to  simulate  terrain  that  is  farther  from  the  operator's  point  of  regard.  The  minimal  number  of  component  sinusoids 
required  to  generate  textures  that  are  visually  acceptable  in  the  visual  periphery  was  estimated  for  the  discrimination  of 
complex  suprathreshold  textures.  Specifically,  similarity  ratings  were  obtained  to  determine  the  effects  of  component 
orientation  and  component  phase-bandwidth  on  the  cortical  magnification  factor  (CMF)  associated  with  that  discrimination. 
The textures were designed to be both specifiable by a relatively small number of localized spectral components and
sufficiently  complex  to  approximate  natural  images.  The  number  of  component  orientations  was  found  to  be  a  particularly 
important  determinant  of  texture  discrimination  in  that  its  effect  on  rated  similarity  was  largely  independent  of  the  total 
number  of  components  making  up  the  texture.  When  the  number  of  components  was  varied,  a  CMF  of  2  was  sufficient  to 
equate  the  similarity  ratings  obtained  at  0.75°  and  20°.  Under  the  same  conditions,  a  CMF  of  4  clearly  overcorrected  the 
data.  The  estimated  CMF  for  texture  discrimination  is  much  smaller  than  that  found  for  the  discrimination  of  simple  2-D 
spatial  frequency  and  suggests  that  either  quantitatively  different  cortical  mechanisms  or  different  cortical  areas  are 
responsible  for  the  two  types  of  discrimination.  When  the  phase-bandwidth  of  the  component  textures  was  varied,  the 
data  were  not  adequately  corrected  by  the  same  CMF  of  4  that  overcorrected  the  component  orientation  data. 


14. SUBJECT TERMS: Gabor functions; Orientation; Perception; Phase; Textures; Vision; Vision models

15. NUMBER OF PAGES: 46

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified

20. LIMITATION OF ABSTRACT

NSN 7540-01-280-5500
Standard Form 298 (Rev 2-89)
Prescribed by ANSI Std Z39-18
298-102


CONTENTS

SUMMARY

INTRODUCTION
    Generating Images Using Spectral Components
    The Cortical Magnification Factor (CMF)
    Component Orientation and Phase-Bandwidth
    Modeling Complex Texture Discrimination

METHODS
    Observers
    Apparatus
    Stimuli
    Procedure
    Data Analysis

RESULTS
    Number of Orientations
    Phase-Bandwidth

DISCUSSION
    Complex Imagery from Spectral Components
    Salience of Component Orientation
    CMF for Suprathreshold Textures
        Orientation-Components Stimuli
        Phase-Bandwidth Stimuli
    Relevance to Simple Models of Texture Segregation

REFERENCES


List of Figures

Figure 1. A Schematic Diagram of the Apparatus Used in the Present Study

Figure 2a. A Typical Texture Set Used in the Orientation Portion of the Present Study

Figure 2b. The Component Distribution for Each Member of the Texture Set Shown in Figure 2a

Figure 3. The Distribution in Phase Space of the Components of a Hypothetical 24-Component Texture

Figure 4a. A Typical Texture Set Used in the Phase-Bandwidth Portion of the Present Study

Figure 4b. The Phase Bandwidth Associated with Each Member of the Texture Set of Figure 4a

Figure 5. Similarity Ratings as a Function of the Number of Components in Three Sets of Textures Composed of Either 8, 6, or 4 Spatial Frequencies

Figure 6. Similarity Ratings for Each of the Three Observers as a Function of the Number of Orientations (#ORs) Making Up the Stimulus Textures

Figure 7. Similarity Ratings for Two Observers as a Function of the Number of Spatial Frequencies (#SFs) Making Up the Stimulus Textures

Figure 8a. Similarity Ratings as a Function of Phase-Bandwidth: Data for Observer GG

Figure 8b. Similarity Ratings as a Function of Phase-Bandwidth: Data for Observer LK

Figure 9. Examples of the Spatial Filters Used to Analyze the Texture Stimuli

Figure 10. Examples of Spatial Filters of the Same Spatial Frequency (2 cpi) but Differing in Orientation by 90 Degrees




PREFACE 

The present effort was conducted in support of the Armstrong Laboratory/Aircrew
Training Research Division (AL/HRA) under Work Unit 1123-03-85, Flying Training Research
Support, and by Air Force Contract F33615-90-C-0005 with the University of Dayton Research
Institute. The laboratory contract monitor was Ms. Patricia A. Spears; task monitor was Dr.
Byron J. Pierce.

The  authors  thank  Chris  Voltz  for  the  image  presentation  and  data  collection  software  and 
Dr.  David  C.  Hubbard  for  assistance  with  the  statistical  analysis.  Portions  of  this  work  have 
been  presented  at  annual  meetings  of  the  Association  for  Research  in  Vision  and  Ophthalmology 
(Geri,  Lyon,  &  Zeevi,  1989,  1990). 




VISUAL  EVALUATION  OF  COMPUTER-GENERATED  TEXTURES 


SUMMARY 

Many  natural  images,  and  in  particular,  natural  textures,  can  be  efficiently  generated  by 
adding  together  simple,  sinusoidal  luminance  distributions.  There  are  at  least  two  advantages 
to  this  approach.  First,  the  number  of  sinusoids  required  to  generate  a  given  texture  is  usually 
much  less  than  the  number  of  gray-levels  required  to  specify  a  given  texture  on  a  point-by-point 
basis  as  is  required,  for  instance,  in  a  raster  display  system.  Second,  since  the  visual  system 
is  not  equally  sensitive  at  all  points  in  the  visual  field,  it  is  often  possible  to  generate  an  image 
more  efficiently  if  visual  information  is  distributed  in  accordance  with  this  sensitivity. 
Sinusoidal  components  can  easily  be  modified  so  that  they  are  spatially  localized  (the  result  is 
called  a  Gabor  function),  and  as  such  they  can  be  added  together  independently,  and  as  required, 
at  various  locations  in  the  image.  In  the  present  study,  various  complex  textures  were  generated 
using  Gabor  functions  of  various  spatial  frequencies  and  orientations.  These  textures  were  then 
compared  to  textures  generated  using  successively  fewer  Gabor  functions  in  order  to  determine 
the  minimal  number  of  components  necessary  to  generate  a  reduced  texture  that  was  visually 
similar  to  the  original.  We  found  that  the  number  of  orientational  components  was  the  most 
important  determinant  of  texture  discrimination  in  that  its  effect  on  the  rated  similarity  of  the 
textures  was  not  dependent  on  the  total  number  of  components.  We  also  found  that  in  the  case 
of  properly  constructed,  wide-field  imagery,  between  one-half  and  three-quarters  of  the 
components  could  be  removed  without  significantly  affecting  the  visual  quality  of  the  texture. 


INTRODUCTION 

Generating  Images  Using  Spectral  Components 

Any  periodic  signal,  including  visual  images,  can  be  represented  by  an  appropriately 
chosen  series  of  weighted  sinusoids.  Natural  images  are  often  spatially  periodic  and  hence  are 
well  represented  in  this  way.  Natural  images  are  also  often  spatially  redundant  and  can  usually 
be  represented  by  many  fewer  components  than  the  number  of  elements  required  to  generate  the 
image  point-by-point  (cf.  Geri,  Zeevi  &  Porat,  1990).  A  spectral  representation  of  images  is 
also  consistent  with  the  postulated  function  of  the  visual  system,  which  is  believed  to  perform 
a  Fourier-like  analysis  of  visual  stimuli  (cf.  Braddick,  Campbell  &  Atkinson,  1978). 
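
As a rough illustration of the redundancy argument, the following sketch (Python with NumPy, not part of the original study) builds a periodic luminance profile from three sinusoids and counts how many Fourier coefficients are needed to describe it. The sample count, frequencies, and amplitudes are arbitrary values chosen for the example, and `count_significant_components` is a hypothetical helper name.

```python
import numpy as np

def count_significant_components(signal, tol=1e-9):
    """Count Fourier coefficients whose normalized magnitude exceeds tol."""
    spectrum = np.fft.rfft(signal) / len(signal)
    return int(np.sum(np.abs(spectrum) > tol))

# Hypothetical 1-D luminance profile: 64 samples built from three sinusoids
# with an integer number of cycles per window (illustrative values only).
n_samples = 64
x = np.arange(n_samples) / n_samples
profile = (1.0 * np.sin(2 * np.pi * 2 * x)
           + 0.5 * np.cos(2 * np.pi * 5 * x)
           + 0.25 * np.sin(2 * np.pi * 11 * x))

n_components = count_significant_components(profile)
print(n_samples, "samples described by", n_components, "sinusoids")
```

In this contrived case the point-by-point description needs 64 values while the spectral description needs only three components. Natural images are not this sparse, so the saving in practice would be smaller.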

The  concept  of  spectrally  selective  visual  mechanisms  was  initially  developed  to  explain 
threshold  psychophysical  data,  and  the  most  successful  of  the  models  incorporating  this  concept 
have  confined  themselves  to  predicting  such  data  (Georgeson  &  Harris,  1984;  Graham  & 
Nachmias,  1971;  Watson,  1982;  Wilson  &  Bergen,  1979).  There  have,  however,  been 
numerous  attempts  to  incorporate  these  mechanisms  into  models  of  suprathreshold  discrimination 
(Cannon  &  Fullenkamp,  1988;  Quick,  1974;  Thomas,  1985;  Watson,  1983),  and  even  the 
discrimination of complex, real-world images (Caelli, 1982). Aside from the well-documented
problems associated with modeling nonlinear responses, there are two major difficulties in
applying the visual mechanism concept to complex, suprathreshold imagery. The first problem
is stimulus control. It is difficult to generate complex stimuli that both approximate real-world
imagery and have easily specified spatial frequency and orientation content. It may
be  possible  to  reach  a  compromise,  however,  between  the  requirements  for  controlled  stimuli 
on  the  one  hand  and  realistic  stimuli  on  the  other  by  using  stimuli  that  are  composed  of  selected, 
discrete,  spectral  components.  Stimuli  of  this  kind  can  be  well  specified  both  spatially  and 
spectrally  and  can  be  used  to  approximate  natural  images  and  other  images  with  continuous 
power  spectra  (Field,  1987;  Kronauer,  Daugman  &  Zeevi,  1982;  Porat  &  Zeevi,  1989).  The 
second  problem  associated  with  applying  the  visual  mechanism  concept  to  complex, 
suprathreshold  imagery  is  that  it  is  often  difficult  to  predict  how  component  spatial  frequency 
or  orientation  will  manifest  itself  in  complex  stimuli.  This  has  proven  to  be  a  problem  even 
when  relatively  simple  one-dimensional  gratings  are  used  (Badcock,  1984;  Lawton,  1984).  In 
the  present  study  a  higher-order  perceptual  response  (similarity  rating)  was  obtained  to 
suprathreshold  synthetic  textures  composed  of  multiple  spectral  components.  While  this 
approach  makes  it  more  difficult  to  relate  the  data  to  early  visual  mechanisms,  it  may  reveal 
higher  order  aspects  of  the  perception  of  visual  texture  which  are  not  predictable  from  threshold 
data. 


Assuming  that  the  perception  of  2-D  spatial  frequency  (i.e.,  spatial  frequency  (SF)  and 
orientation  (OR))  is  mediated  by  a  relatively  small  number  of  overlapping  mechanisms,  it  is 
conceivable  that  one  of  these  mechanisms  could  be  overstimulated  in  the  sense  that  adding 
another  component  within  the  bandwidth  of  the  mechanism  could  result  in  a  relatively  smaller 
change  in  the  mechanism’s  output  and  hence  an  overall  smaller  perceptual  difference  in  the 
stimulus. If this is true, then adding many components within a small frequency or orientation
range may be an inefficient way to represent the image, in that a better selected distribution of
components may give an adequate representation of its global properties. It also follows that,
at some density of sampling in spatial frequency or orientation, each additional component will
no longer stimulate the mechanisms with maximal efficiency.

One  goal  of  the  present  study  was  to  determine  the  minimum  number  of  localized 
sinusoidal  components  necessary  to  preattentively  represent  a  complex  image  composed  of  64 
components.  This  number  was  chosen  as  a  compromise  between  the  resolution  attainable  for 
a  moderately  sized  image  presented  on  available  visual  displays  and  the  requirement  that  the 
textures  reflect  at  least  some  of  the  complexities  inherent  in  natural  scenes  (Porat  &  Zeevi, 
1989). 


The Cortical Magnification Factor (CMF)

It  is  well  established  that  the  visual  field  is  nonhomogeneously  mapped  onto  the  visual 
cortex  such  that  more  cortical  area  is  devoted  to  processing  stimuli  that  are  nearer  to  the  center 
of the visual field (Daniel & Whitteridge, 1961; Talbot & Marshall, 1941; Van Essen, Newsome
& Maunsell, 1984). This fact, along with the observation that performance on most visual tasks
declines as a function of retinal eccentricity (cf. Raninen & Rovamo, 1986; Rovamo & Virsu,
1979), has led to the concept of a cortical magnification factor (CMF). It is presumed that the
level of performance on a visual task is related to the cortical area devoted to that task, and thus
that performance at different retinal eccentricities can be equated by scaling the size of the more
peripheral stimulus by the CMF.
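
The scaling idea can be sketched in a few lines of Python. This is only an illustration under the assumption that the CMF is applied as a uniform magnification of the peripheral stimulus; the function name and the example values are hypothetical. Magnifying an image by a given factor multiplies its angular size by that factor and divides its spatial frequencies (in cycles/deg) by the same factor.

```python
def scale_for_eccentricity(size_deg, freq_cpd, cmf):
    """Magnify a stimulus by cmf: angular size grows, spatial frequency shrinks."""
    if cmf <= 0:
        raise ValueError("cmf must be positive")
    return size_deg * cmf, freq_cpd / cmf

# Hypothetical central stimulus: 2 deg wide with a 12 cycles/deg component,
# magnified by a CMF of 2 for peripheral presentation.
peripheral_size, peripheral_freq = scale_for_eccentricity(2.0, 12.0, cmf=2.0)
print(peripheral_size, peripheral_freq)
```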

Human  foveal  sensitivity  requires  that  very  high  resolution  displays  be  used  to  present 
imagery  at  the  point  of  fixation.  Since  it  is  beyond  current  technology  to  generate  and  display 
such  high  resolution  over  the  entire  visual  field,  some  form  of  variable-resolution  technique  must 
be  used  to  efficiently  simulate  the  visual  environment  (Zeevi,  Porat  &  Geri,  1990).  The  CMF 
is  thus  particularly  relevant  to  visual  simulation  in  that  it  specifies  the  required  distribution  of 
visual  information  as  a  function  of  distance  from  the  fixation  point.  In  the  present  study  we 
have  estimated  a  CMF  for  the  discrimination  of  complex  textures  similar  in  appearance  to  those 
used  in  many  flight  simulators  (Schachter,  1980,  1983).  The  textures  used  here  have  the  added 
advantage  that  they  are  composed  of  spectral  components  and  hence  can  be  readily  incorporated 
in  full  gray-scale  imagery  using  image  generation  techniques  that  are  both  mathematically 
complete  and  computationally  efficient  (Porat  &  Zeevi,  1988;  Zeevi  &  Gertner,  1992). 

The  CMF  associated  with  certain  positional  judgments  such  as  vernier  acuity  (Klein  & 
Levi,  1987;  Levi,  Klein  &  Aitsebaomo,  1985)  and  phase  discrimination  (Bennett  &  Banks,  1991; 
Hess  &  Pointer,  1987)  is  typically  greater  than  would  be  expected  based  on  the  anatomical  and 
neurophysiological  data.  This  has  led  to  the  suggestion  that  the  visual  periphery  is  inherently 
insensitive  to  positional  relationships  due  either  to  its  limited  sampling  capability  (Snyder,  1982) 
or to a deficit in neural processing (Levi & Klein, 1985; Virsu, Näsänen & Osmoviita, 1987).
In  order  to  further  assess  the  visual  response  to  position-related  characteristics  of  complex 
stimuli,  we  have  also  estimated  a  CMF  for  the  discrimination  of  changes  in  structural  coherence 
associated  with  increases  in  the  phase-bandwidth  of  multicomponent  textures. 


Component  Orientation  and  Phase-Bandwidth 

Among  the  most  salient  features  of  psychophysically  defined  visual  mechanisms  is  their 
selectivity  for  spatial  frequency  and  orientation  (Campbell  &  Robson,  1968;  Carpenter  & 
Blakemore,  1973;  Graham  &  Nachmias,  1971).  These  features  are  also  conspicuous  in  the 
response  properties  of  individual  neurons  at  many  levels  in  the  visual  system  (Enroth-Cugell  & 
Robson,  1966;  Gattass,  Sousa  &  Covey,  1985),  as  well  as  in  the  topography  of  the  visual  cortex 
(Hubel & Wiesel, 1962, 1968). Daugman (1980, 1984) has emphasized the canonical
relationship  between  spatial  frequency  and  orientation  and  the  importance  of  considering  both 
of  these  variables  in  assessing  the  inherently  two-dimensional  response  properties  of  visual 
receptive  fields.  These  principles  are  now  generally  accepted,  and  hence  in  most  recent  studies, 
no  preference  is  accorded  to  either  spatial  frequency  or  orientation.  However,  these  studies 
usually  involve  threshold  responses  to  relatively  simple  stimuli.  It  has  proven  more  difficult  to 
apply simple neural models to the discrimination of more complex stimuli (Caelli, 1982; Klein
& Tyler, 1986; Malik & Perona, 1990), and such models certainly cannot be expected to
quantitatively  predict  the  data  obtained  from  higher  order,  or  more  cognitive,  visual  tasks.  It 
might  therefore  be  worthwhile  in  this  context  to  determine  the  degree  to  which  higher  order 
perceptual  responses  to  complex  stimuli  reflect,  at  least  qualitatively,  the  properties  of  putative 
two-dimensional  spatial  frequency  mechanisms. 

The  ability  to  discriminate  both  spatial  frequency  and  orientation  is  known  to  decrease 
with  retinal  eccentricity,  and  this  decrease  can  usually  be  compensated  by  applying  an 
appropriate CMF (Johnston, 1987; Nothdurft, 1985; Rovamo, Virsu & Näsänen, 1978; Scobey,
1982). However, stimulus characteristics that might appear related to spatial frequency and
orientation are often processed differently and may even result in different estimates of the
CMF. For instance, Nothdurft (1985) found that the minimum line orientation difference that could be
discriminated was generally smaller than the orientation difference required to discriminate
adjacent textures. Also, the CMFs estimated by Scobey (1982) for a line orientation task were
less than one-half those estimated by Paradiso and Carney (1988). Thus it is difficult to
generalize  estimated  CMFs  even  from  one  relatively  simple  visual  discrimination  to  another. 
Given  that  more  than  one  component  process  may  be  involved  in  discriminating  complex  stimuli, 
and  given  that  more  complex  discriminations  may  be  mediated  by  higher  (and  progressively 
more  integrative)  cortical  areas,  it  might  be  expected  that  the  discrimination  of  simple  and 
complex  stimuli  would  involve  different  CMFs  (cf.  Gattass  et  al.,  1985;  Levi  et  al.,  1985). 

In  the  present  study  the  number  of  component  orientations  in  a  set  of  synthetic  textures 
was  varied  in  order  to  determine  whether  the  perceptual  salience  of  this  parameter,  which  has 
been noted in previous studies (cf. Beck, 1982; Julesz, 1981), is also evident in the
discrimination  of  full  gray-scale  textures.  Measurements  were  obtained  both  near  the  fovea  and 
in  the  peripheral  visual  field  in  order  to  estimate  a  CMF  for  this  discrimination. 


Modeling  Complex  Texture  Discrimination 

Texture  perception  research  has  typically  used  so-called  microtexture  patterns  composed 
of  fields  of  small,  relatively  simple  elements  and  only  a  few  gray  levels  (e.g.,  Beck,  1982; 
Julesz,  1981).  It  is  clearly  difficult  to  generalize  from  such  stimuli  to  natural  images,  which 
typically  comprise  multiple  gray-scales  and  often  display  a  characteristic  power  spectrum  (Field, 
1987).  It  is  likewise  difficult  to  generalize  the  results  of  many  of  the  current  models  of  texture 
perception  (cf.  Clark  &  Bovik,  1989;  Fogel  &  Sagi,  1989;  Graham,  Beck,  &  Sutter,  1992)  that 
are  based  on  mechanisms  inferred  from  neurophysiological  work  on  visual  receptive  fields. 
These  studies  have  used  relatively  simple  stimuli  and  sets  of  feature  analyzers  that  match  the 
textures  whose  discrimination  is  being  modeled.  While  many  of  the  results  have  justified  this 
approach,  texture  segregation  models  must  ultimately  predict  the  discrimination  of  more 
complex,  natural  textures  (Caelli,  1982).  One  appropriate  set  of  stimuli  for  testing  these  models 
could  be  generated  by  adding  together  localized  spectral  components  such  as  those  used  in  the 
present study. Such stimuli have been shown to approximate natural textures (Porat & Zeevi,
1989), and they would be conveniently analyzable in terms of a physiologically plausible set of
feature  analyzers. 


It has been recognized (cf. Badcock, 1984) that suprathreshold stimuli consisting of even
a few sinusoidal components can manifest perceptually salient, local luminance "features" that
cannot be predicted solely from the response of simple mechanisms to those components.
Although the present state of development of texture models is not sufficient to predict the
perceptual response to arbitrary suprathreshold stimuli, there may be aspects of the perception
of such stimuli that are consistent with the general tenets of current models and which may
constrain or help to delineate the proper form of future models. For instance, early research
(Beck, 1966, 1982; Julesz, 1981) found that the orientation of the pattern elements making up
a texture was an important determinant of texture segregation, and this is consistent with more
recent quantitative models based on oriented detectors (Clark & Bovik, 1989; Fogel & Sagi,
1989; Graham, Beck, & Sutter, 1992). We have applied the texture discrimination model of
Sutter, Beck, and Graham (1989) to the multicomponent texture stimuli used in the present
study. This model was chosen because it incorporates many of the features of the other models
and it has been developed in the context of experimental tasks and texture stimuli most similar
to those used in the present study. The results are relevant both to assessing the generality of
the  model  and  to  establishing  additional  criteria  for  evaluating  this  and  other  texture  segregation 
models. 


METHODS 

Observers.  Three  observers,  two  females  and  one  male  (GG),  participated  in  the  present 
series  of  experiments.  Observers  LK  and  SP  were  in  their  mid-twenties,  were  unaware  of  the 
purpose  of  the  study,  and  were  compensated  for  their  participation.  Observer  GG  was  38  years 
old  and  was  one  of  the  authors.  All  observers  had  normal  uncorrected  vision. 

Apparatus.  Stimulus  generation,  data  collection,  and  data  analysis  were  under  the  control 
of  an  IBM  PC-AT  and  a  PCVision  video  board  (Imaging  Technology  Inc.,  see  Fig.  1).  Stimuli 
were  presented  on  Conrac  monitors  (Model  724 1C  19)  using  only  the  green  channel  (P22 
phosphor).  For  the  central  (0.75  deg)  condition,  the  two  textures  (see  below)  making  up  each 
stimulus  were  presented  on  the  same  monitor  and  were  viewed  at  a  distance  of  3  m  after 
reflection  by  two  front-surface  mirrors.  The  stimuli  were  presented  in  the  upper  half  of  the 
monitor  display  which  provided  a  total  illuminated  area  of  3.8  by  7.6  deg.  The  fixation  point 
was  a  small  black  dot  which  was  placed  directly  on  the  monitor  screen.  For  the  peripheral  (20 
deg)  condition,  the  same  pairs  of  textures  (suitably  scaled,  see  below)  were  presented  on  two 
separate  monitors  that  were  viewed  directly  from  a  distance  of  1.5  m.  A  large  cardboard  screen 
was  used  to  provide  a  homogeneous  surface  between  and  around  the  monitors.  Two  holes  were 
cut  in  the  screen  such  that  only  the  left  texture  of  the  stimulus  pair  on  the  left  monitor  and  the 
right texture of the stimulus pair on the right monitor were visible. The resulting visual
impression was that the two halves of the monitor display used in the central condition had been
separated along the horizontal meridian. A green LED mounted in the cardboard screen midway
between the two monitors served as a fixation point.

Figure 1. A Schematic Diagram of the Apparatus Used in the Present Study.
Pairs of stimuli were presented either 0.75° or 20° on either side of the fixation point. Although
both pairs are shown here, only one pair of stimuli was presented in a given experimental
session. Sizes and distances are described in the text and are not shown to scale.

The  stimulus  display  monitors  were  calibrated  using  a  Spectra  Spotmeter  photometer 
(Photo  Research).  The  function  relating  image  pixel  value  to  monitor  luminance  was  linearized 
using  a  look-up  table  constructed  in  accordance  with  the  measured  response  characteristics  of 
the  monitors.  The  mean  luminance  of  the  texture  stimuli  was  14  fL,  and  the  experimental  room 
was  otherwise  dark.  The  observers  were  provided  with  a  chin  and  head  rest  and  were  asked  to 
enter  their  rating  response  on  a  computer  keyboard. 
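
The report does not give the details of the look-up table, so the following Python sketch only shows one common way such a table could be built: invert the measured pixel-to-luminance curve by interpolation. The calibration data here are synthetic (a made-up power-law display with a 28 fL peak), and `build_linearizing_lut` is a hypothetical helper, not the authors' software.

```python
import numpy as np

def build_linearizing_lut(pixel_values, measured_luminance, n_levels=256):
    """Map each requested level to the pixel value whose measured luminance
    is linear in the requested level."""
    pixel_values = np.asarray(pixel_values, dtype=float)
    measured_luminance = np.asarray(measured_luminance, dtype=float)
    # Target luminances, evenly spaced between the measured extremes.
    targets = np.linspace(measured_luminance[0], measured_luminance[-1], n_levels)
    # Invert the measured response by interpolation. This assumes luminance
    # increases monotonically with pixel value.
    lut = np.interp(targets, measured_luminance, pixel_values)
    return np.rint(lut).astype(int)

# Synthetic calibration data (assumption): gamma-2.2 display, 28 fL peak.
pixels = np.arange(256)
luminance = 28.0 * (pixels / 255.0) ** 2.2
lut = build_linearizing_lut(pixels, luminance)

# Luminance the synthetic display would produce for each requested level.
linearized = 28.0 * (lut / 255.0) ** 2.2
```

With real photometer readings, `pixels` and `luminance` would be replaced by the measured pairs, and the residual error would depend on how densely the response was sampled.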

Stimuli.  For  both  the  orientation-components  and  phase-bandwidth  stimulus  sets,  two- 
dimensional  textures  (see  Fig.  2a)  were  generated  off-line  using  a  program  which  added  together 
between  12  and  64  Gabor  functions  each  with  a  luminance  distribution,  L(x,y)  of  the  form: 


\[ L(x,y) = \exp\left\{-\pi\left[\left(\frac{x}{D}\right)^{2} + \left(\frac{y}{D}\right)^{2}\right]\right\} \cdot \cos(\omega_x x + \phi_x + \omega_y y + \phi_y), \tag{1} \]

where D is the effective width (±1σ) of the Gaussian window, ω_x and ω_y are the spatial
frequency projections in the horizontal and vertical directions, respectively, and φ_x and φ_y are
the associated phase shifts. The spatial frequencies referred to in this report correspond to

\[ \omega = \sqrt{\omega_x^{2} + \omega_y^{2}}. \tag{2} \]
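
A minimal Python sketch of one Gabor component is given here for illustration. It assumes that Equation 1 is a Gaussian window, exp{-π[(x/D)² + (y/D)²]}, multiplying a cosine carrier, and that the horizontal and vertical frequency projections follow from a radial frequency and an orientation as ω_x = ω cos θ and ω_y = ω sin θ, which is consistent with Equation 2. The grid size, the conversion from cycles to radians, and the three example components are assumptions made for the sketch, not values from the report.

```python
import numpy as np

def gabor(x, y, D, freq, orientation_deg, phase_x=0.0, phase_y=0.0):
    """Evaluate one Gabor component: Gaussian window times cosine carrier."""
    theta = np.deg2rad(orientation_deg)
    # Assumption: freq is in cycles per unit of x and y, converted to radians.
    omega = 2.0 * np.pi * freq
    omega_x = omega * np.cos(theta)  # horizontal projection
    omega_y = omega * np.sin(theta)  # vertical projection
    window = np.exp(-np.pi * ((x / D) ** 2 + (y / D) ** 2))
    carrier = np.cos(omega_x * x + phase_x + omega_y * y + phase_y)
    return window * carrier

# Hypothetical sampling grid spanning two window widths (D = 1).
coords = np.linspace(-1.0, 1.0, 65)
x, y = np.meshgrid(coords, coords)

# A toy texture: the sum of three components given as
# (frequency, orientation in degrees, horizontal phase shift).
components = [(2.0, 0.0, 0.0), (4.0, 45.0, np.pi / 2), (8.0, 90.0, np.pi)]
texture = sum(gabor(x, y, D=1.0, freq=f, orientation_deg=o, phase_x=p)
              for f, o, p in components)
```

The textures in the study summed between 12 and 64 such components. Three are used here only to keep the example short.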


The  component  spatial  frequencies  for  each  texture  were  distributed  logarithmically  between  1 
and  12  cycles/deg  (see  Fig.  2b).  This  was  accomplished  by  first  generating  a  series  of  numbers, 
Aj,  from  0  to  1,  according  to  the  formula,  \  =  (i-l)/(N-l),  where  N  is  the  number  of  spatial 
frequencies  in  the  set,  and  i  is  an  integer  between  1  and  N.  The  spatial  frequencies  for  the  set 
were  then  determined  by  raising  12  to  the  Ajth  power  for  each  of  the  N  A/s.  The  results  of  this 
procedure  for  N=8  are  shown  in  Table  1.  Various  numbers  (M)  of  orientations  were  associated 
with  each  of  the  spatial  frequencies  determined  as  described  above.  Specifically,  the  orientations 
were  linearly  spaced  between  0  deg  and  180-(180/M)  deg  (see  Fig.  2b). 
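The spacing rules described above can be sketched in a few lines. This is a minimal sketch under the assumption that N is at least 2; the function names are hypothetical and not taken from the original program.

```python
import numpy as np

def spatial_frequencies(n, max_freq=12.0):
    """Log-spaced frequencies: max_freq ** A_i with A_i = (i-1)/(N-1), i = 1..N."""
    a = np.arange(n) / (n - 1)  # assumes n >= 2
    return max_freq ** a

def orientations_deg(m):
    """M orientations linearly spaced between 0 and 180 - 180/M degrees."""
    return np.arange(m) * 180.0 / m

freqs = spatial_frequencies(8)
oris = orientations_deg(8)
```

For N = 8 the computed frequencies agree with Table 1 to within about 0.01; the small differences arise because the table was computed from exponents rounded to three decimal places.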

For  the  stimuli  used  to  investigate  the  effects  of  number  of  component  orientations,  each 
of  the  Gabor  components  making  up  a  particular  texture  had  the  same  effective  width  but  no  two 
had  both  the  same  number  of  spatial  frequencies  (#SFs)  and  the  same  number  of  orientations 
(#ORs).  Each  component  also  had  a  unique  phase.  The  level  of  phase  quantization  among 
components  depended  on  the  number  of  components  (n)  in  the  texture.  For  an  n-component 
texture, the levels of phase quantization used corresponded to the values of 2πm/n radians
where  m  is  an  integer  between  1  and  n.  In  order  to  minimize  the  effects  of  local  luminance 
Figure 2a

A Typical Texture Set Used in the Orientation Portion of the Present Study
The  textures  in  this  set  are  composed  of  eight  spatial  frequencies  at  each  of  8,  7, 6,  5, 4  or  3  orientations. 
Figure 2b

The  Component  Distribution  for  Each  Member  of  the  Texture  Set  Shown  in  Figure  2a. 

The component phases for all textures were randomized. The texture at the upper right (8SFs/8ORs)
is one of the textures that was used as a standard stimulus on each trial. The other three standard
stimuli were obtained by randomly redistributing the component phases of the 8SF/8OR texture.


Table  1.  Computation  of  the  Spatial  Frequency  Levels 
(in  cycles/Gaussian  halfwidth)  Used  in  a  Gabor 
Texture  Containing  N=8  Spatial  Frequencies 


 i      A_i      ω

 1     0.000     1
 2     0.143     1.43
 3     0.286     2.04
 4     0.429     2.9
 5     0.571     4.13
 6     0.714     5.9
 7     0.857     8.41
 8     1.000    12

variations  on  the  similarity  judgments,  four  different  phase  stimuli  were  generated  for  each 
#SFs-#ORs  combination.  The  four  stimuli  were  generated  by  randomly  reassigning  each  of  the 
phases,  obtained  as  described  above,  to  the  various  components  of  the  original  stimulus. 
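The phase quantization and the four randomized phase versions can be sketched as follows. This is a minimal sketch that assumes the quantized phases are 2πm/n for m = 1..n and that each version is a random permutation of those phases across components; the function names and the seed are hypothetical.

```python
import numpy as np

def quantized_phases(n):
    """The n quantized phase levels 2*pi*m/n, for m = 1..n."""
    m = np.arange(1, n + 1)
    return 2.0 * np.pi * m / n

def phase_versions(n, n_versions=4, seed=0):
    """Randomly reassign the same set of phases to the n components."""
    rng = np.random.default_rng(seed)
    base = quantized_phases(n)
    return [rng.permutation(base) for _ in range(n_versions)]

versions = phase_versions(64)
```

Every version contains exactly the same set of phases, so the versions differ only in which component carries which phase.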

For  the  stimuli  used  to  investigate  the  effects  of  phase-bandwidth,  textures  were  generated 
by  adding  together  either  24  or  64  Gabor  functions.  Four  groups  of  textures  were  produced, 
using  the  following  combinations  of  #SFs  and  #ORs,  respectively:  4  and  16,  4  and  6,  8  and  8, 
and  8  and  3.  Once  again,  the  phase  of  each  Gabor  component  relative  to  the  center  of  the 
texture  was  randomly  determined.  In  this  portion  of  the  study,  the  main  independent  variable 
was  phase-bandwidth,  defined  as  the  width  of  the  region  (in  phase  space)  over  which  the  phases 
of  the  component  Gabor  functions  were  distributed.  Shown  in  Figure  3  is  the  phase  distribution 
of  a  hypothetical  24-component  stimulus  set.  In  order  to  avoid  extensive  areas  of  luminance 
saturation, especially near the center of the resultant image, one-half of the components were
distributed about phase = 0 radians and the other half were distributed about phase = π radians.
The phase bandwidths tested were determined by the parameter r (see Fig. 3), which was set
equal to 0 (i.e., all components at the same phase), 13/3, 13/4, 13/5, and 13/6. An example
of a phase-bandwidth stimulus set (corresponding to 8SFs/8ORs) is shown in Figure 4a. The
bandwidths for this and all other sets used in the present study are given by 2π/r
and hence equal 0, 6π/13, 8π/13, 10π/13, and 12π/13 radians. The textures shown in Figure
4a  were  obtained  by  distributing  components  in  phase  space  in  the  manner  shown  in  Figure  4b. 
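The mapping from the parameter r to phase bandwidth, and one possible way of placing the component phases, can be sketched as follows. This is a sketch under stated assumptions: the bandwidth is taken to be 2π/r, with r = 0 treated as the special zero-bandwidth case; each half of the components is assumed to be spread evenly over the full bandwidth, centered on 0 or on π; and the number of components is assumed to be even. The report does not specify how the bandwidth is shared between the two clusters, and the function names are hypothetical.

```python
import numpy as np

def phase_bandwidth(r):
    """Bandwidth in radians: 2*pi/r, with r = 0 denoting zero bandwidth."""
    return 0.0 if r == 0 else 2.0 * np.pi / r

def component_phases(n_components, r):
    """Half the phases spread about 0, half about pi (assumed even spacing)."""
    bw = phase_bandwidth(r)
    half = n_components // 2  # assumes an even number of components
    offsets = np.linspace(-bw / 2.0, bw / 2.0, half)
    return np.concatenate([offsets, np.pi + offsets])

r_values = [0, 13 / 3, 13 / 4, 13 / 5, 13 / 6]
bandwidths = [phase_bandwidth(r) for r in r_values]
```

With these r values the computed bandwidths are 0, 6π/13, 8π/13, 10π/13, and 12π/13 radians.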

The stimuli presented at 0.75° eccentricity were 0.75° in diameter (at ±1σ) and the
stimuli  presented  at  20°  eccentricity  were  either  1.5°  or  3.0°  in  diameter  (i.e.,  CMFs  of  2  and 
Figure 3

The  Distribution  in  Phase  Space  of  the  Components 
of  a  Hypothetical  24-Component  Texture 

One-half of the components were distributed around zero phase and one-half about π
phase in order to avoid loss of texture structure due to the luminance saturation that
occurred near the center of the image where the Gaussian envelope peaked for all
components. The parameter, r, determines the phase-bandwidth and is inversely
proportional  to  it. 
Figure 4a

A Typical Texture Set Used in the Phase-Bandwidth Portion of the Present Study
A wider distribution in phase space is associated with a greater spatial
disruption  of  the  original  texture  pattern  shown  at  the  upper  left. 


Figure  4b 

The  Phase  Bandwidth  Associated  with  Each  Member  of  the  Texture  Set  of  Figure  4a. 
Each of the textures shown is composed of 64 components (8SFs and 8ORs)
evenly  distributed  in  phase  space  over  the  areas  shown  in  the  graphs. 


4).  The  CMF  of  4  was  the  largest  we  could  test  given  the  size  of  the  display  monitor  and  our 
desire to minimize observer accommodation by maintaining a viewing distance of at least 1.5 m.

When  modulating  a  sinusoidal  luminance  distribution  by  a  Gaussian,  a  shift  in  the  mean 
luminance  results  whenever  the  sinusoid  is  not  antisymmetrical  about  the  center  of  the  Gaussian 
(i.e., for all sinusoids except a sine wave with phase equal to an integer multiple of π radians).
Since  the  textured  patterns  extended  over  only  a  portion  of  the  display,  there  would  be  a 
mismatch  in  the  mean  luminances  of  the  textures  and  the  background  if  image  contrast  were 
maximized  using  the  most  straightforward  luminance  scaling  procedure  of  assigning  the 
minimum  (maximum)  display  luminance  value  to  the  minimum  (maximum)  pixel  values. 
Therefore,  to  assure  that  the  mean  luminance  of  the  pattern  would  match  that  of  the  background, 
the  luminance  range  of  each  multicomponent  texture  was  scaled  as  follows.  Either  the  maximum 
pixel  value  in  the  texture  was  set  equal  to  the  maximum  luminance  (i.e.,  twice  the  mean 
luminance),  or  the  minimum  value  was  set  equal  to  the  minimum  luminance,  depending  on 
whether  the  minimum  (most  negative)  or  maximum  (most  positive)  pixel  value  was  farther  from 
zero.  Although  the  Gaussian  envelope  theoretically  extends  to  infinity  in  all  directions,  in 
practice it was limited to ±2σ with all pixels beyond this limit set equal to the mean luminance.
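The scaling rule described in this paragraph can be sketched as follows. This is a minimal sketch, not the original program: it assumes the summed texture is stored as signed values about zero, that the display range runs from zero to twice the mean luminance, and that the region beyond the truncation limit is supplied as a boolean mask. The function name and the `outside` argument are hypothetical; the default mean of 14 fL is the value reported for the texture stimuli.

```python
import numpy as np

def scale_to_luminance(signed, mean_lum=14.0, outside=None):
    """Map signed pixel values to luminance while keeping zero at the mean.

    The extreme farther from zero is mapped to the nearer display limit
    (0 or 2 * mean_lum), so the other extreme stays inside the range.
    """
    signed = np.asarray(signed, dtype=float)
    if outside is not None:
        # Pixels beyond the envelope limit are set to the mean luminance.
        signed = np.where(outside, 0.0, signed)
    peak = np.max(np.abs(signed))
    if peak == 0.0:
        return np.full(signed.shape, mean_lum)
    return mean_lum * (1.0 + signed / peak)

lum = scale_to_luminance([-2.0, 0.0, 1.0])
```

Because a value of zero always maps to the mean luminance, the truncated region and the surrounding background match regardless of which extreme sets the scale.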

Procedure.  The  observers  first  adapted  for  about  five  minutes  to  the  ambient  illumination 
in the experimental room. The last minute of adaptation was spent viewing a homogeneous field
corresponding  to  the  mean  luminance  of  the  stimulus  textures  to  be  presented.  The  observers 
initiated  the  session  using  their  keyboard,  and  the  first  texture  pair  was  presented  for  167  ms. 
The  observer  then  rated  the  similarity  of  the  members  of  the  pair  by  pressing  the  appropriate 
numeric  key  on  the  keyboard.  The  next  pair  was  presented  2  s  after  the  observer’s  response, 
and  this  cycle  continued  for  the  remainder  of  the  session.  In  the  component-orientation  portion 
of  the  study,  18  combinations  of  #SFs  and  #ORs  were  tested  in  each  session.  Each  of  the  4 
phase-versions corresponding to each #SF-#OR category was paired with each of the 4 phase-
versions in the #SFs=8/#ORs=8 category, the latter thus serving as the standard stimuli. Each of the
resulting pairs was presented 8 times (4 with one of the standards on the right and 4 with it on
the left) in each experimental session, resulting in a total of 18 × 4 × 8 = 576 trials randomized
with  respect  to  all  variables  mentioned  above.  The  three  combinations  of  eccentricity  and  test 
stimulus  diameter  (referred  to  here  as  the  eccentricity-size  factor)  were  tested  in  a  different  order 
for  each  observer.  The  testing  procedures  were  similar  in  the  phase-bandwidth  portion  of  the 
study,  except  that  there  was  a  single  standard  stimulus  corresponding  to  each  of  the  four  stimulus 
sets  described  earlier.  The  phase-bandwidth  of  the  standard  stimuli  was  zero  indicating  that  half 
of the components had a phase of zero and half had a phase of π radians.

For  both  the  component-orientations  and  phase-bandwidth  portions  of  the  study,  the 
observers  were  asked  to  rate,  on  a  scale  from  1  (most  dissimilar)  to  7  (most  similar),  the 
perceived  similarity  of  the  two  textures  making  up  each  stimulus.  At  least  two  practice  sessions 
were  run  to  acquaint  the  observers  with  the  full  range  of  possible  differences  between  the  pairs 
of  textures.  As  noted  above,  in  the  component-orientation  portion  of  the  study,  there  were  four 
phase  versions  of  each  of  the  standard  and  test  stimuli.  Similarity  ratings  were  therefore 
obtained  between  each  phase  version  of  each  stimulus  and  each  phase  version  of  the  standard 
(i.e., the stimulus for which #SFs = 8 and #ORs = 8). Thus, a high similarity rating does not
necessarily  mean  perceptual  equivalence  since  the  baseline  for  each  observer’s  ratings  is  the 
perceived  difference  not  among  physically  identical  stimuli  but  rather  among  stimuli  that  differed 
in  the  distribution  of  phase  among  each  of  their  components. 

Data  Analysis.  For  the  component-orientations  portion  of  the  study,  data  from  the 
0.75°/0.75° condition were compared with data from each of the other two conditions (20°/1.5°
and 20°/3.0°) in two separate analyses of variance (ANOVAs). Each of the ANOVAs included
three factors (eccentricity size, #SFs, and #ORs) and was performed using a randomized block
design with observers as the block. Only some levels of the #ORs factor were included (i.e.,
those that were assessed at more than one level of the #SFs factor). Even so, there was a small
number of missing cells because these two factors were not completely crossed. All observer
interactions  were  pooled  into  a  single  error  term  which  was  used  to  test  the  significance  of  all 
main  effects  and  interactions. 

For the phase-bandwidth portion of the study, data from the 0.75°/0.75° condition were
again compared with data from each of the other two conditions (20°/1.5° and 20°/3.0°) in
separate ANOVAs. Each of these ANOVAs included four completely crossed factors (phase
bandwidth,  eccentricity  size,  number  of  components,  and  observer).  Each  of  the  data  sets 
analyzed  in  the  two  ANOVAs  described  above  was  further  analyzed  by  four  additional 
ANOVAs,  one  for  each  of  the  two  levels  of  the  number-of-components  factor  for  each  of  the 
two  observers. 


RESULTS 

Since  our  initial  concern  was  the  relationship  between  rated  similarity  and  number  of 
components,  our  initial  analysis  was  performed  on  these  two  variables.  Examples  of  these  data 
from  one  observer  and  one  eccentricity-size  condition  are  shown  in  Figure  5.  The  data  show 
that  the  general  shape  of  the  rating  functions  is  the  same  for  each  number  of  spatial  frequencies, 
and  that  there  appears  to  be  a  systematic  shift  along  the  horizontal  axis  as  a  function  of  the 
number  of  spatial  frequencies  in  the  stimulus  texture.  This  suggested  that  number-of-orientations 
was  the  appropriate  variable  on  which  to  perform  subsequent  analyses. 


Number  of  Orientations 

Shown  in  Figure  6  are  mean  similarity  ratings,  plotted  as  a  function  of  the  number  of 
orientations  (#ORs)  in  the  test  stimulus,  for  all  three  observers  and  all  three  eccentricity-size 
conditions.  The  three  sets  of  points  within  each  of  the  nine  panels  were  obtained  using  test 
stimuli  containing  either  4,  6,  or  8  spatial  frequencies.  As  can  be  seen  from  the  figure,  there 
is  considerable  overlap  among  the  three  sets  of  points,  indicating  that  the  number  of  spatial 
frequencies (#SFs), and hence the total number of components in the texture stimulus, has
relatively  little  effect  on  perceived  similarity.  This  observation  is  supported  by  a  nonsignificant 
#SFs  main  effect  and  a  nonsignificant  #SFs  x  #ORs  interaction  in  each  of  the  two  ANOVAs. 
Figure 6

Similarity  Ratings  for  Each  of  the  Three  Observers  as  a  Function 
of  the  Number  of  Orientations  (#ORs)  Making  Up  the  Stimulus  Textures 
Each  panel  shows  a  data  set  replotted  from  one  similar  to  that  of  Figure  5.  The  top  panel  for 
each  observer  shows  data  obtained  for  a  texture  pair  (i.e.,  0.75°  standard  and  test)  presented  at 
0.75°  on  either  side  of  the  fixation  point.  The  middle  panels  show  the  data  obtained  for  textures 
composed of the same components but doubled in size to 1.5° (a CMF of 2) and presented at 20°
eccentricity.  The  bottom  panels  show  the  data  obtained  using  3.0°  textures  (a  CMF  of  4)  again 
presented  at  20°. 
As is evident from the data of Figure 6, rated similarity initially increased as the #ORs
in  the  test  stimuli  were  increased,  and  then  remained  approximately  constant  as  the  #ORs 
approached  (and  exceeded)  the  #ORs  (i.e.,  8)  in  the  standard  stimuli.  This  observation  is 
reflected in the significant #ORs main effect in both the ANOVA comparing the 0.75°/0.75° and
20°/1.5° conditions (F(6,67) = 142, p < 10⁻⁴) and the ANOVA comparing the 0.75°/0.75° and 20°/3.0°
conditions (F = 138, p < 10⁻⁴). In order to estimate the critical #ORs (i.e., the #ORs at which
rated  similarity  no  longer  changes  as  the  #ORs  are  increased),  we  have  fitted  two  straight  lines 
to the data in each panel of Figure 6 using a least-squares criterion. A horizontal line was fitted
to  the  rating  data  obtained  for  8  and  10  orientations,  and  a  second  straight  line  was  fitted  to  the 
rating  data  obtained  for  3,  4,  and  5  orientations.  The  intersection  of  the  two  regression  lines 
in each panel was used to estimate the critical #ORs, that is, the #ORs above which the test
textures  were  perceived  to  be  as  similar  to  the  standard  textures  as  the  various  phase  versions 
of  the  standard  textures  were  to  themselves.  Estimates  of  the  critical  #ORs  are  indicated  by  the 
arrows  placed  along  the  horizontal  axis  in  each  of  the  panels  of  Figure  6. 
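The two-line estimate of the critical #ORs can be sketched as follows. This is a minimal sketch: the ratings in the example are invented purely to exercise the code and are not data from the study, and the function name is hypothetical. The least-squares horizontal line through the plateau points is simply their mean.

```python
import numpy as np

def critical_num_orientations(n_ors, ratings, rising=(3, 4, 5), plateau=(8, 10)):
    """Intersection of a fitted rising line with a fitted horizontal plateau."""
    n_ors = np.asarray(n_ors, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    rise = np.isin(n_ors, rising)
    flat = np.isin(n_ors, plateau)
    # Least-squares straight line through the low-#ORs points.
    slope, intercept = np.polyfit(n_ors[rise], ratings[rise], 1)
    # Least-squares horizontal line through the high-#ORs points.
    level = ratings[flat].mean()
    return (level - intercept) / slope

# Hypothetical ratings, for illustration only.
example_ors = [3, 4, 5, 6, 7, 8, 10]
example_ratings = [2.0, 3.0, 4.0, 5.0, 5.8, 6.0, 6.0]
critical = critical_num_orientations(example_ors, example_ratings)
```

The estimate is undefined when the rising line has zero slope, so a practical version would need to guard against that case.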

For all observers, the #ORs at asymptote are approximately the same for the 0.75°/0.75°
and 20°/1.5° conditions, which is consistent with the fact that the eccentricity size x #ORs
interaction did not even approach significance in the ANOVA comparing these two conditions
(F = 0.26, p = 0.95). Further, there was no main effect of eccentricity size. Thus, doubling
the  size  of  the  peripheral  texture  stimuli  was  sufficient  to  produce  ratings  similar  to  those 
obtained  for  the  smaller  stimuli  near  the  fovea.  In  contrast,  the  #ORs  at  asymptote  for  the 
20°/3.0° condition are greater than for the 0.75°/0.75° condition, and this is reflected in the
significant eccentricity size x #ORs interaction in the ANOVA comparing these conditions
(F = 4.72, p < 0.0004).

For  purposes  of  comparison,  similarity  rating  was  also  plotted  as  a  function  of  the 
number  of  spatial  frequencies  (#SFs)  in  the  test  stimulus,  for  two  of  the  original  three  observers 
and  all  three  eccentricity-size  conditions.  The  results  are  shown  in  Figure  7  where  it  can  be 
seen  that  the  plot  for  #ORs=4  is  generally  flatter  than  the  analogous  plots  for  #ORs=6  and  8. 
This  reinforces  the  conclusion,  drawn  from  the  same  data  plotted  in  Figure  6,  that  6  and  8 
orientations  are  generally  at  or  above  the  estimated  critical  #ORs  while  4  is  below  it. 


Phase-Bandwidth 

Mean  similarity  ratings  as  a  function  of  phase-bandwidth,  obtained  from  observers  GG 
and  LK,  are  shown  in  Figures  8a  and  8b,  respectively.  The  four  panels  in  each  figure 
correspond  to  the  four  combinations  of  #SFs/#ORs  tested.  For  the  two  ANOVAs  (one 
comparing  the  foveal  condition  with  each  of  the  peripheral  conditions),  all  main  effects  and 
interactions were significant (p < 10⁻⁴) except for the observer x bandwidth and number-of-
components x eccentricity-size interactions. Both ANOVAs indicated a significant decrease in
rated similarity as phase-bandwidth was increased (F(4,7634) = 1087, p < 10⁻⁴; F(4,7000) = 893, p < 10⁻⁴).
The decrease was greater for the foveal condition than for either of the peripheral conditions,
as indicated by significant bandwidth x eccentricity-size interactions (F(4,7634) = 74.2, p < 10⁻⁴;
Figure 7

Similarity Ratings for Two Observers as a Function of the
Number of Spatial Frequencies (#SFs) Making Up the Stimulus Textures
The top panel for each observer shows data obtained for a texture pair (i.e., a 0.75° standard
and a 0.75° test) presented at 0.75° on either side of the fixation point. The middle panels show
the data obtained for textures composed of the same components but doubled in size to 1.5° (a
CMF of 2) and presented at 20° eccentricity. The bottom panels show the data obtained using
3.0° textures (a CMF of 4) again presented at 20°.
Figure 8a

Similarity Ratings as a Function of Phase-Bandwidth Data for Observer GG
For 0.75° stimuli presented at 0.75° eccentricity (filled circles) and for either 1.5° or 3.0° stimuli
(representing CMFs of 2 and 4, respectively) presented at 20° eccentricity. Data for observer
GG obtained using either 64 (left two panels) or 24 total components and an orientation density
(#ORs/#components) of either one-quarter (upper two panels) or one-eighth.
Figure 8b

Similarity  Ratings  as  a  Function  of  Phase-Bandwidth  Data  for  Observer  LK 
For  0.75°  stimuli  presented  at  0.75°  eccentricity  (filled  circles)  and  for  either  1.5°  or  3.0°  stimuli 
(representing  CMFs  of  2  and  4,  respectively)  presented  at  20°  eccentricity.  Data  for  observer 
LK  obtained  using  either  64  (left  two  panels)  or  24  total  components  and  an  orientation  density 
(#ORs/#components) of either one-quarter (upper two panels) or one-eighth.
F(4,7000) = 115, p < 10⁻⁴). The significant bandwidth x eccentricity-size x number-of-components
interactions (F(4,7634) = 25.2, p < 10⁻⁴; F(4,7000) = 3.49, p < 0.008) further indicate that the difference
between the size of the bandwidth effects in the foveal and peripheral conditions varied with
number-of-components. Of the four additional ANOVAs, performed in order to determine
which  level  (24  or  64)  of  the  number-of-components  factor  resulted  in  the  larger  bandwidth  x 
eccentricity-size  interaction,  the  relevant  F-ratios  of  three  (all  except  number-of-components  = 
24  for  observer  GG)  showed  that  the  interaction  was  larger  for  the  smaller  number  of 
components. 


DISCUSSION 

Complex Imagery from Spectral Components

Human  spatial  vision  is  presumed  to  be  mediated  by  overlapping  mechanisms,  each  with 
a  bandwidth  of  one  to  two  octaves  (cf.  Graham,  1989).  This  general  view  has  been  derived 
from  threshold  data,  and  even  for  those  data  it  may  not  predict  responses  to  complex  stimuli 
composed of as few as two or three superimposed sinusoids (Badcock, 1984; Lawton, 1984).
Although  the  present  state  of  development  of  the  mechanism  doctrine  may  not  be  sufficient  to 
predict  the  perceptual  response  to  complex,  suprathreshold  stimuli,  there  may  be  aspects  of  the 
perception  of  such  stimuli  which  are  consistent  with  the  predictions  of  that  doctrine  (or  which 
may  at  least  constrain  or  help  to  delineate  the  situations  where  it  may  effectively  be  applied). 
One  of  the  questions  that  motivated  the  present  research  was  whether  there  was,  for  a  complex 
image,  a  minimum  number  of  components  which  would  produce  an  adequate  perceptual 
representation  of  it.  The  answer  to  this  question  depends  on  the  criteria  chosen  both  to  define 
image  complexity  and  to  determine  what  is  an  adequate  perceptual  representation.  In  the  context 
of  the  postulated  mechanisms  of  spatial  vision,  a  complex  stimulus  could  contain  as  few  as  two 
sinusoidal  components,  whereas  thousands  of  such  components  are  required  to  represent  the 
complexity  encountered  in  natural  images.  Once  an  appropriate  level  of  image  complexity  is 
chosen,  the  most  direct  way  to  determine  the  minimum  number  of  components  needed  to 
adequately  represent  the  image  is  to  first  reconstruct  it  using  successively  more  components  and 
then  to  compare  these  partial  reconstructions  to  the  original. 

Given a fixed number of components with which to construct an image, what is the best
way to distribute them? In general it might be supposed that components which are very similar
in  spatial  frequency  and/or  orientation  would  interact  so  that  the  perceptual  difference  between 
two  such  components  and  one  within  the  same  region  of  the  visual  field  would  be  smaller  than 
it  would  be  if  the  two  components  were  separated  more  widely  in  SF  and/or  OR.  For  both  of 
the  present  studies,  our  major  concern  is  with  what  may  be  considered  emergent  properties  of 
texture discrimination, specifically the perceptual similarity of gratings composed of different
numbers  of  components,  and  the  perceptual  disruption  of  image  structure  as  phase-bandwidth 
is  increased.  Although  it  may  not  be  possible  at  present  to  establish  how  such  emergent 
properties  are  related  to  the  overall  pattern  of  activity  in  the  presumed  set  of  mechanisms,  it  is 
certainly useful to determine whether they change in a predictable way with known changes in
mechanism properties which occur, for instance, across the visual field.


Salience  of  Component  Orientation 

The  data  of  Figure  5  show,  as  might  be  expected,  that  rated  similarity  increases  as  the 
number  of  components  making  up  the  test  texture  approaches  the  number  of  components  in  the 
standard  texture.  More  surprising  perhaps  is  that  the  rate  of  increase  in  perceived  similarity  is 
largely  independent  of  the  #SFs  in  the  texture  series.  This  is  true  despite  the  fact  that  the  three 
texture  series,  which  produced  the  individual  functions  of  Figure  5,  are  easily  distinguishable, 
and that all textures were rated relative to the same (phase) set of 8OR/8SF standard textures.
The  similarities  in  the  form  of  the  functions  of  Figure  5  become  more  obvious  when  rated 
similarity  is  plotted  as  a  function  of  the  number  of  orientations  in  the  test  texture.  Data  plotted 
in  this  form  are  shown  in  the  upper  row  of  Figure  6  for  each  of  the  three  observers  at  a  retinal 
eccentricity  of  0.75°.  Plotting  the  data  in  this  way  is  justified  by  the  ANOVA  results  described 
earlier,  and  clearly  shows  that  the  #ORs  in  the  test  texture  are  the  primary  determinant  of 
perceived  similarity. 

The  orientation  of  pattern  elements  is  known  to  be  an  important  determinant  of  both 
threshold  and  suprathreshold  form  perception.  For  instance,  differences  in  the  orientation  of 
texture micropatterns can underlie both perceptual grouping (Beck, 1966) and preattentive texture
segregation  (Beck,  1982;  Julesz,  1981;  Nothdurft,  1985).  The  visual  salience  of  pattern 
orientation  is  reflected  also  in  the  orientation  columns  in  the  visual  cortex  (Carpenter  & 
Blakemore, 1973; Hubel & Wiesel, 1962). The orientation columns are in turn one
manifestation of presumed visual channels (Hubel & Wiesel, 1968) that have a two-dimensional
spatial  (and  spectral)  structure  with  an  inherent  orientation.  The  data  of  Figure  6  suggest, 
however,  that  orientation  and  spatial  frequency  may  not  be  perceptually  equivalent.  Specifically, 
whereas  the  data  indicate  that  component  orientation  is  a  particularly  salient  determinant  of 
texture  appearance,  the  fact  that  ratings  are  lower  for  textures  composed  of  fewer  orientations 
but  not  for  those  composed  of  fewer  spatial  frequencies  suggests  that  the  same  perceptual 
importance  is  not  attached  to  component  spatial  frequency. 

The  above-described  salience  of  component  orientation  is  a  higher-order  perceptual 
property  that  is  not  readily  predictable  from  the  responses  of  the  visual  processes  that  are 
presumed  to  underlie  threshold  spatial  vision.  In  general,  modeling  higher-order  perception 
using  these  visual  processes  has  been  successful  only  for  selected  stimuli  (e.g.,  Bergen  &  Landy, 
1991;  Clark  &  Bovik,  1989;  Graham,  Beck  &  Sutter,  1992;  Turner,  1986).  Caelli  and  Bevan 
(1983)  obtained  similarity  ratings  for  the  comparison  of  broadband-filtered  stochastic  textures 
with  their  orientation-filtered  counterparts.  For  the  spatial  frequency  range  below  8  c/deg, 
which  is  similar  to  that  used  in  the  present  study,  their  results  were  consistent  with  threshold 
data  obtained  using  simpler  stimuli.  The  spectral  components  making  up  the  stimuli  used  in  the 
present  study  (see  Fig.  2)  correspond  in  some  ways  to  visual  weighting  functions  (cf.  Jones  & 
Palmer,  1987)  and  represent  an  extension  of  the  stimulus  set  used  by  Caelli  and  Bevan  in  that 
orientation is explicitly defined in the spatial domain. Thus, it might be expected that even
similarity  ratings  obtained  for  the  present  stimuli  would  reflect  in  some  aspect  the  response 
properties  of  basic  visual  mechanisms.  Some  evidence  that  this  is  the  case  can  be  seen  in  the 
critical  #ORs  estimated  from  the  intersection  of  the  functions  fitted  to  the  data  in  the  upper  row 
of  Figure  6.  The  critical  #ORs  are  between  5  and  7  and  are  therefore  consistent  with  threshold 
data,  which  suggest  that  there  are  5-8  orientation  channels  with  bandwidths  of  15-30  deg 
(Dannemiller & Ver Hoeve, 1990; Daugman, 1984; Phillips & Wilson, 1984). Correlations
of  this  sort  have  also  been  noted  in  the  context  of  other  suprathreshold  texture  discriminations 
(cf.,  Caelli,  1982;  Richards  &  Polit,  1974).  Thus,  we  conclude  from  the  data  of  Figure  6  that 
complex,  suprathreshold  textures  can  be  generated  which  are  preattentively  indistinguishable 
from  textures  containing  many  more  components,  provided  that  the  former  contain  at  least  5-7 
orientations.  These  results  may  also  be  applicable  to  real-world  imagery  in  that  spectral  textures 
similar  to  those  used  here  can  be  chosen  to  visually  match  many  natural  textures  (Porat  &  Zeevi, 
1989)  as  well  as  textures  derived  from  other  continuous  spectra  (Kronauer,  Daugman  &  Zeevi, 
1982). 


CMF  for  Suprathreshold  Textures 

Orientation-Components  Stimuli.  A  CMF  for  the  discrimination  of  suprathreshold,  full 
gray-scale  textures  has  not  to  our  knowledge  been  previously  estimated.  The  importance  of 
component  orientation  in  determining  the  similarity  rating  of  these  textures  suggests,  however, 
that  their  CMF  may  correspond  to  those  obtained  for  other  orientation-related,  preattentive 
discriminations.  Estimates  for  the  latter  are  available  from  several  sources.  Scobey  (1982) 
found  that  the  discrimination  of  line  orientation  could  be  equated  at  the  fovea  and  the  retinal 
periphery if line length was scaled in accordance with the size of cortical receptive fields as
reported by Hubel and Wiesel (1974). He determined that a CMF of about 3.3 was sufficient
to equate discrimination at 0° and 10° eccentricity. Virsu et al. (1987) have noted that Hubel
and  Wiesel’s  data  are  consistent  with  the  psychophysical  data  of  Rovamo  and  Virsu  (1979)  and 
hence  would  predict  that  a  CMF  of  six  would  equate  Scobey’s  data  at  0.75°  and  20°  eccentricity. 
Also,  Paradiso  and  Carney  (1988)  concluded  that  their  orientation  discrimination  data  could  be 
adequately  scaled  using  the  CMF  estimated  by  Levi  et  al.  (1985)  from  the  data  of  Dow,  Snyder, 
Vautin  and  Bauer  (1981).  Those  data  give  a  relative  CMF  of  about  14  between  0.75°  and  20° 
eccentricity.  Nothdurft  (1985)  measured  both  orientation  sensitivity  and  the  discrimination  of 
textures  composed  of  lines  whose  orientation  differed.  Although  he  concluded  that  the  two  tasks 
are  mediated  by  functionally  distinct  mechanisms,  his  data  indicate  that  performance  on  both 
tasks  can  be  equated  at  0.75°  and  20°  eccentricity  if  the  more  peripheral  stimuli  are  scaled  by 
a  factor  of  about  7.  Virsu  et  al.  (1987)  determined  the  orientation  threshold  for  a  two-dot 
vernier  discrimination,  and  again,  the  data  were  found  to  be  consistent  with  the  CMFs  estimated 
by  Rovamo  and  Virsu  (1979).  Finally,  Geri  and  Lyon  (1991)  estimated  a  CMF  of  6  for  shape 
adaptation  to  closed  contour  stimuli  whose  orientation  was  varied  systematically.  Thus,  the 
available  data  on  orientation-related  discrimination  predict  a  CMF  of  at  least  6  between  0.75° 
and  20°  eccentricity. 

The data in the third row of Figure 6 were obtained at 20° eccentricity using a CMF of
4  relative  to  that  used  to  obtain  the  data  in  the  first  row  of  that  figure  (corresponding  to  0.75° 
eccentricity).  Even  this  relatively  low  CMF  resulted  in  an  overcorrection  of  the  data,  in  that 
both  rated  similarity  and  the  critical  #ORs  (indicated  by  the  arrows  in  Fig.  6)  were  significantly 
higher  at  20°  eccentricity.  In  an  attempt  to  better  estimate  a  CMF  for  our  textures,  the  data  in 
the  second  row  of  Figure  6  were  then  obtained  (also  at  20°  eccentricity)  using  a  CMF  of  2.  This 
CMF  appears  to  adequately  reproduce  the  ratings  obtained  at  0.75°  eccentricity  in  that  both  the 
maximum  rating  and  the  derived  critical  #ORs  are  now  very  similar.  Thus,  the  CMF  estimated 
here  for  the  similarity  rating  of  complex  textures  is  significantly  lower  than  those  estimated  for 
simpler  orientation-related  discriminations. 

The  available  data  suggest  that  the  visual  cortex  is  organized  hierarchically  with  higher 
levels performing successively more complex visual analyses (cf., Van Essen & Maunsell, 1983).
Complex, spatially separated, suprathreshold textures and a similarity rating task were used in
the  present  study  and  thus  the  data  of  Figure  6  probably  reflect  a  relatively  high  level  of  visual 
processing  (cf.  Lamme,  Van  Dijk  &  Spekreijse,  1992).  Gattass  et  al.  (1985)  have  summarized 
neurophysiological  estimates,  which  indicate  that  the  CMF  is  generally  lower  for  higher  visual- 
cortical  levels.  Although  there  appear  to  be  differences  also  in  the  slopes  of  the  CMF  vs. 
eccentricity  functions,  the  variability  in  the  data  as  well  as  the  problems  inherent  in  estimating 
CMFs  near  the  fovea  make  it  difficult  to  conclude  that  there  is  a  difference  in  the  relative  CMF 
of the various cortical levels between 0.75° and 20° eccentricity. It is well known, however, that
extensive  reciprocal  connections  exist  among  the  various  cortical  areas,  and  it  appears  that  visual 
information  is  progressively  integrated  at  higher  levels  (cf.,  Van  Essen  &  Maunsell,  1983).  This 
integrative  property  of  the  visual  cortex  might  be  expected  to  average  the  CMFs  of  contributing 
cortical  areas  and  thus  reduce  the  CMF  of  higher  levels.  Further,  visual  receptive  field  size 
increases  with  eccentricity  at  a  greater  rate  for  higher  cortical  areas  (Gattass  et  al.,  1985). 
These  observations  are  consistent  both  with  the  proposition  that  information  is  progressively 
integrated  at  higher  cortical  levels  and  with  the  present  data  which  indicate  a  relatively  low  CMF 
for  the  suprathreshold  discrimination  of  complex  textures. 

Phase-Bandwidth  Stimuli.  Increasing  the  phase-bandwidth  of  two-dimensional, 
multicomponent  Gabor  textures,  such  as  the  one  shown  at  the  upper  left  in  Figure  4,  results  in 
a  disruption  of  their  spatial  structure.  As  is  evident  from  our  analysis  of  the  data  of  Figure  8 
(i.e.,  the  significant  bandwidth  x  eccentricity-size  interactions),  this  disruption  is  more 
consistently  discriminated  at  0.75°  than  at  20°  even  when  the  more  peripheral  data  are  magnified 
by  the  same  factor  of  four  that  overcorrected  the  component-orientation  data  of  Figure  6.  This 
difference  is  to  be  expected  if  positional  acuities  are  required  to  discriminate  the  structural 
differences  associated  with  changes  in  phase-bandwidth. 

Hofmann  and  Hallett  (1993)  note  that  for  simple,  regular  patterns,  orientation  and  phase 
are  effectively  local  parameters  since  the  changes  they  induce  are  easily  identifiable  with  features 
defined  by  local  differences  in  luminance.  If  such  features  are  mediating  discrimination,  they 
should  scale  as  luminance  or  contrast  and  hence  be  equatable  in  the  center  and  periphery. 
However,  in  more  complex  textures  such  as  those  used  in  the  present  study,  the  size  of  the 
luminance features may be a limiting factor in discrimination. Changes in phase-bandwidth tend
to  result  in  local  feature  changes  that  are  on  a  smaller  scale  than  those  associated  with  changing 
the  number  of  oriented  components  in  a  texture.  Since  spatial  features  are  known  to  be  more 
difficult to discriminate in the periphery, the differences in the CMF for the #OR and
phase-bandwidth stimuli may simply be due to a relative undersampling of the present textures in the
peripheral  visual  field  (cf.,  Hess  &  Pointer,  1987). 

It  has  been  suggested  (Hofmann  &  Hallett,  1993;  Klein  &  Tyler,  1986)  that  textures 
whose  components  differ  in  phase  only  may  be  more  difficult  to  discriminate  than  textures  whose 
components  differ  in  orientation.  While  both  the  numbers-of-orientation  and  phase-bandwidth 
stimuli  of  the  present  study  were  matched  for  mean  luminance,  the  effects  of  visual  system 
nonlinearities,  as  discussed  for  example  by  Bennett  and  Banks  (1991),  could  result  in  contrast 
differences  between  the  standard  and  comparison  images.  However,  the  phase-bandwidth 
textures were in no case mirror images of each other, and so such nonlinearities cannot explain
differences  in  the  discrimination  of  the  two  types  of  stimuli  used  here. 

The  largest  CMF  we  were  able  to  test  here  is  below  those  predicted  by  the  discrimination 
data  described  above,  and  so  we  cannot  conclude  that  data  like  those  of  Figure  8  would  not  be 
adequately  corrected  by  the  same  CMFs  used  to  successfully  correct  data  obtained  from  other 
positional  tasks.  The  differences  in  the  CMFs  indicated  by  the  data  of  Figures  6  and  8  do 
suggest,  however,  that  the  low  CMF  associated  with  the  present  suprathreshold  texture 
discrimination  (Fig.  6)  is  not  shared  by  a  suprathreshold  task  presumably  dependent  on  positional 
cues  (Fig.  8).  Since  this  is  true  of  the  present  stimuli,  which  are  complex  textures,  it  may  also 
be  true  of  phase  discriminations  in  real-world  images  (Burton  &  Moorhead,  1981;  Oppenheim 
& Lim, 1981).


Relevance  to  Simple  Models  of  Texture  Segregation 

Early  theories  of  texture  segregation  (e.g.,  Beck,  1982;  Julesz,  1981)  considered  the 
effects  of  various  properties  (i.e.,  orientation,  closure,  terminations,  statistical  order)  of 
individual  texture  elements.  Some  recent  models  (e.g.,  Bergen  &  Landy,  1991;  Clark  &  Bovik, 
1989;  Graham,  Beck  &  Sutter,  1992;  Turner,  1986)  have  attempted  to  account  for  the  visual 
segregation  of  certain  simple  textures  using  local  feature  analyzers  in  the  form  of  elementary 
functions  (including  Gabor  functions)  presumed  to  characterize  visual  receptive  fields.  Texture 
discrimination,  however,  is  more  than  just  texture  segregation,  and  real  textures  are  more 
complex  than  those  to  which  models  are  typically  applied.  We  have  generated  textures  that  are 
complex  but  nevertheless  quantifiable  in  terms  of  their  Gabor  components.  They  may,  therefore, 
be  particularly  suitable  for  testing  models  of  texture  discrimination. 

As  an  example  of  how  the  present  data  might  be  used  to  assess  current  texture 
segregation  theories,  consider  the  models  of  Bergen  and  Landy  (1991)  and  Graham  et  al.  (1992). 
These  models  incorporate  component  channels  with  a  total  of  only  3-4  different  orientations.  The 
most  salient  feature  of  the  data  of  Figure  6  is  that  the  number  of  orientations  at  which 
discrimination asymptotes (i.e., the critical #ORs) is between 5 and 7. Thus, the present results
indicate that only a few additional channels would be necessary to adequately model the
discrimination of more complex, full gray-scale textures. The critical #ORs estimated here are
also  consistent  with  threshold  data,  which  suggest  that  there  are  5-8  orientation  channels  with 
bandwidths  of  15-30  deg  (Dannemiller  &  Ver  Hoeve,  1990;  Daugman,  1984;  Phillips  &  Wilson, 
1984). 


Although  it  is  unlikely  that  the  models  can  predict  the  specific  perception  associated  with 
a  stimulus  composed  of  many  sinusoidal  components,  the  estimated  bandwidth  of  the  presumed 
underlying  mechanisms  might  lead  to  a  prediction  about  the  global  perception  of  such  a  stimulus. 
This  estimate  might  then  be  inconsistent  with  the  actual  perceptual  response  when  the  component 
spatial  frequencies  and  orientations  are  separated  by  more  than  that  bandwidth.  Such  perceived 
changes  are  emergent  properties  of  a  particular  combination  of  components  in  the  sense  that  they 
cannot  necessarily  be  predicted  by  the  presumed  response  to  the  individual  components.  It  is 
not  clear  whether  emergent  percepts  are  related  to  the  overall  pattern  of  activity  in  the  presumed 
set  of  mechanisms.  However,  the  emergent  percepts  become  more  meaningful  if  it  can  be 
shown  that  they  change  in  a  predictable  way  with  known  changes  in  mechanism  properties  such 
as  occur,  for  instance,  across  the  visual  field. 

As  noted  above,  the  textures  used  in  the  present  experiments  were  constructed  from 
spectral  components  whose  form  is  similar  to  the  filters  used  in  some  models  of  texture 
segregation. For example, the model proposed by Sutter et al. (1989) uses Gabor filters of
various  orientations  and  spatial  frequencies.  These  filters  correspond  quite  well  to  the 
components  of  our  textures.  We  therefore  attempted  to  apply  the  Sutter,  Beck,  and  Graham 
(SBG)  model  to  the  data  obtained  from  rating  the  similarity  of  multicomponent  textures.  In 
doing  this,  we  assumed  that  the  degree  to  which  a  texture  pair  would  segregate  perceptually 
would  be  inversely  related  to  the  pair’s  rated  similarity  (i.e.,  similar  pairs  would  segregate  less 
strongly  than  dissimilar  pairs). 

Our  implementation  of  the  SBG  model  used  sixteen  Gabor  filters:  four  orientations  (0, 
45,  90,  and  135  deg)  at  each  of  four  spatial  frequencies  (2,  4,  8,  and  16  cycles  per  image  (cpi)). 
Shown  in  Figure  9  are  examples  of  the  filters  that  corresponded  to  each  of  the  four  spatial 
frequencies  at  a  single  orientation.  Examples  of  a  2  cpi  filter  at  two  orientations  differing  by 
90  degrees  are  shown  in  Figure  10.  Predictions  of  similarity  ratings  for  a  pair  of  textures  were 
derived  as  follows: 

(1)  The  original  256  x  256  texture  images  were  decimated  to  128  x  128  arrays,  and  the 
8-bit  gray-scale  was  transformed  to  the  range  -128  to  128. 

(2)  Each  of  the  16  filters  was  convolved  with  a  texture  array,  resulting  in  16  convolution 
arrays  for  each  texture  array.  The  convolution  algorithm  padded  the  texture  array  with  zeros 
out  to  the  size  of  the  resulting  convolution  array. 


Figure  9 

Examples  of  the  Spatial  Filters  Used  to  Analyze  the  Texture  Stimuli 
The  textures  were  256x256  pixels  and  so  the  filters  correspond  to  spatial  frequencies  of: 
2  cycles  per  image  (cpi),  4  cpi,  8  cpi,  and  16  cpi. 


Figure  10 

Examples of Spatial Filters of the Same Spatial Frequency (2 cpi)
but  Differing  in  Orientation  by  90  Degrees 


(3)  The  standard  deviation  of  each  convolution  array  was  computed.  In  the  SBG  model, 
this  standard  deviation  is  taken  to  be  the  spatially  pooled  response  of  a  particular  filter  to  a 
stimulus. 

(4)  In  a  comparison  of  two  stimulus  textures  (one  of  them  always  a  64-component 
standard),  the  difference  in  pooled  response  between  comparison  and  standard  was  computed 
separately  for  each  of  the  sixteen  filters. 

(5) Each of these sixteen difference scores was multiplied by a value representing the
sensitivity of the observer to the particular frequency and orientation of the filter. These relative
sensitivity values were obtained by interpolation from the same contrast-sensitivity function used
by  Sutter  et  al.  (1989). 

(6)  Finally,  the  corrected  filter  output  differences  were  squared  and  summed.  This 
yielded  a  value  which  was  hypothesized  to  be  proportional  to  the  perceived  difference  between 
textures.  Thus,  larger  values  should  correspond  to  smaller  observed  similarity  ratings. 
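
The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the Gabor envelope width, the removal of each filter's mean, the use of plain subsampling for decimation, and the unit sensitivity weights are all placeholders (the study interpolated its weights from the contrast-sensitivity function used by Sutter et al., 1989, whose values are not given here). Random arrays stand in for the texture images, and the function names are hypothetical.

```python
import numpy as np

FREQS_CPI = (2, 4, 8, 16)        # cycles per image, from the text
ORIENTS_DEG = (0, 45, 90, 135)   # filter orientations, from the text


def gabor(n, cycles, theta_deg):
    """Even-phase Gabor on an n x n grid (envelope width is an assumption)."""
    half = n // 2
    y, x = np.mgrid[-half:half, -half:half].astype(float)
    theta = np.deg2rad(theta_deg)
    u = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the carrier
    sigma = 0.5 * n / cycles                    # assumed: half a carrier period
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * cycles * u / n)
    return g - g.mean()                         # remove DC (assumption)


def preprocess(image_8bit):
    """Step 1: decimate by two and centre the 8-bit gray scale on zero."""
    small = np.asarray(image_8bit, dtype=float)[::2, ::2]   # plain subsampling
    return small - 128.0                                    # 0..255 -> -128..127


def full_convolve(image, kernel):
    """Step 2: full linear convolution; the FFT zero-pads to the output size."""
    shape = (image.shape[0] + kernel.shape[0] - 1,
             image.shape[1] + kernel.shape[1] - 1)
    spectrum = np.fft.rfft2(image, s=shape) * np.fft.rfft2(kernel, s=shape)
    return np.fft.irfft2(spectrum, s=shape)


def pooled_responses(texture, filters):
    """Step 3: standard deviation of each convolution array."""
    return np.array([full_convolve(texture, f).std() for f in filters])


def predicted_difference(standard, comparison, filters, sensitivity):
    """Steps 4-6: per-filter difference, sensitivity weighting, sum of squares."""
    diff = pooled_responses(comparison, filters) - pooled_responses(standard, filters)
    return float(np.sum((sensitivity * diff) ** 2))


rng = np.random.default_rng(0)
standard = rng.integers(0, 256, size=(256, 256))     # stand-in for a standard texture
comparison = rng.integers(0, 256, size=(256, 256))   # stand-in for a comparison texture
filters = [gabor(128, f, th) for f in FREQS_CPI for th in ORIENTS_DEG]
weights = np.ones(len(filters))                      # placeholder sensitivities
score = predicted_difference(preprocess(standard), preprocess(comparison),
                             filters, weights)
print(f"predicted difference: {score:.3f}")
```

Under the hypothesis in step (6), a larger value of `score` would correspond to a lower rated similarity; identical inputs give a value of exactly zero.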

The  correspondence  between  predicted  difference  and  observed  similarity  was  found  to 
be  poor.  In  an  effort  to  determine  the  source  of  the  discrepancies  between  the  model  and  the 
data,  we  applied  the  model  to  a  variety  of  simpler  stimuli,  many  composed  of  only  one  or  two 
components.  The  results  of  these  tests  highlight  some  properties  of  the  SBG  model  which 
suggest  that  the  model  may  not  be  applicable  to  multicomponent  textures. 

The  texture  segregation  models  mentioned  above  have  been  successful  in  predicting  the 
perceptual  segregation  of  particular,  widely  used,  classes  of  textures.  As  noted  above,  the 
textures  used  in  the  present  experiments  were  constructed  from  spectral  components  which  might 
be  expected  to  stimulate  the  mechanisms  postulated  in  these  models.  However,  the  extent  to 
which  the  presence  of  a  spectral  component  in  the  stimulus  leads  to  a  corresponding  output  in 
the  model  depends  upon  the  figure  of  merit  used  in  computing  model  output.  For  example,  in 
the  simple  model  of  SBG  referred  to  earlier,  the  output  of  a  filter  tuned  to  a  particular  spatial 
frequency  and  orientation  is  determined  by  convolving  the  filter  over  the  stimulus  image  and 
then  computing  the  standard  deviation  of  the  elements  of  the  convolution.  This  method  of 
computing  filter  output  has  the  property  that  stimulus  components  outside  the  bandwidth  of  the 
filter will alter the output value. For example, when we used an exact duplicate of a 16 cpi/0-deg
filter as a simple input stimulus, the resulting filter output was large, as expected, since
stimulus  and  filter  were  a  perfect  match.  But  when  we  added  a  second  component  at  2  cpi  and 
90-deg  orientation  to  the  input  stimulus,  filter  output  was  markedly  reduced.  Thus,  the  SBG 
method  of  computing  filter  output  results  in  a  response  measure  that  is  markedly  affected  by  the 
presence  of  distant  spectral  components.  For  purposes  of  comparison,  we  used  the  same  filter 
and  the  same  two  stimuli  with  a  different  method  of  computing  filter  output,  namely,  a  simple 
element-by-element  multiplication  of  the  filter  and  stimulus  at  a  single  (0  deg)  phase.  The  result 
of  this  operation  was  virtually  unaffected  by  the  addition  of  the  2  cpi/90-deg  component. 
Moreover, although the output computation used by SBG is phase-invariant for a single-component
image, when we examined the response of a single filter to images composed of two
components (2 and 16 cpi, 0-deg orientation), the output of the filter changed with the relative phase
of the two components.
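
The element-by-element alternative described above can be illustrated with a short Python sketch. It covers only the pointwise measure and makes no attempt to reproduce the standard-deviation results reported for the SBG computation. The grid size, the envelope width, and the choice to sum the element-by-element products into a single number are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np


def gabor(n, cycles, theta_deg, sigma):
    """Even-phase (0 deg) Gabor patch on an n x n grid."""
    half = n // 2
    y, x = np.mgrid[-half:half, -half:half].astype(float)
    theta = np.deg2rad(theta_deg)
    u = x * np.cos(theta) + y * np.sin(theta)   # coordinate along the carrier
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * cycles * u / n)


def pointwise_output(stimulus, filt):
    """Sum of the element-by-element product of stimulus and filter (assumed pooling)."""
    return float(np.sum(stimulus * filt))


n, sigma = 64, 8.0                              # illustrative values only
filt = gabor(n, 16, 0, sigma)                   # 16 cpi / 0-deg filter
one_component = filt.copy()                     # stimulus identical to the filter
two_component = filt + gabor(n, 2, 90, sigma)   # add a 2 cpi / 90-deg component

out_one = pointwise_output(one_component, filt)
out_two = pointwise_output(two_component, filt)
relative_change = abs(out_two - out_one) / out_one
print(out_one, out_two, relative_change)
```

With these settings the added component is nearly orthogonal to the filter, so the relative change in the pointwise output is negligible, in line with the observation that this measure was virtually unaffected by the 2 cpi/90-deg component.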

The  sensitivity  to  relative  phase  and  to  components  outside  the  filter  bandwidth  described 
above  may  be  advantageous  in  some  situations,  but  these  properties  make  the  task  of  applying 
the  model  to  our  multicomponent  textures  problematic.  For  example,  we  must  generate 
predictions  about  rated  similarity  between  textures  that  have  different  (randomized)  phase 
compositions.  If  the  responses  to  the  SBG  filters  are  largely  determined  by  the  particular 
relative  phases  between  components  in  an  image,  then  they  will  not  reflect  the  variables  that 
control  the  responses  of  our  observers  (e.g.,  number  of  orientations).  We  ran  the  SBG  model 
on  two  of  our  four  standard  64-component  images  as  well  as  some  of  the  comparison  images. 
Observers  see  the  two  standard  images  as  very  similar,  but  the  SBG  model  produces  a  difference 
value  much  greater  than  the  difference  between  the  standards  and  some  comparisons  that  get 
very  low  similarity  ratings.  We  conclude,  therefore,  that  the  SBG  model  does  not  adequately 
predict  the  rated  similarity  of  multicomponent  Gabor  textures  of  the  form  used  in  the  present 
study. It is not obvious how the SBG model, or any other existing model of texture
discrimination, can be modified to predict the discrimination of arbitrary texture stimuli. One
possibility,  however,  is  to  replace  the  convolution  operation  with  a  more  general  summation 
operation  that  better  reflects  the  spatial  resolution  and  two-dimensional  spatial  frequency 
characteristics  of  known  visual  receptive  fields. 